Gauge transformation



Computing frustration and near-monotonicity in deep neural networks

Wendin, Joel, Larsson, Erik G., Altafini, Claudio

arXiv.org Machine Learning

For the signed graph associated to a deep neural network, one can compute the frustration level, i.e., test how close or distant the graph is to structural balance. For all the pretrained deep convolutional neural networks we consider, we find that the frustration is always less than expected from null models. From a statistical physics point of view, and in particular in reference to an Ising spin glass model, the reduced frustration indicates that the amount of disorder encoded in the network is less than in the null models. From a functional point of view, low frustration (i.e., proximity to structural balance) means that the function representing the network behaves near-monotonically, i.e., more similarly to a monotone function than in the null models. Evidence of near-monotonic behavior along the partial order determined by frustration is observed for all networks we consider. This confirms that the class of deep convolutional neural networks tends to have a more ordered behavior than expected from null models, and suggests a novel form of implicit regularization.
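As a toy illustration of the quantity involved (a brute-force sketch on a small signed graph, not the authors' pipeline, which must handle networks far too large for enumeration): the frustration index is the minimum number of negative edges achievable under any switching of node signs, and it vanishes exactly when the graph is structurally balanced.

```python
import itertools

import numpy as np

def frustration_index(A):
    """Minimum number of frustrated edges over all 2^n sign switchings.

    A is a symmetric signed adjacency matrix with entries in {-1, 0, +1}.
    Brute force, so only viable for small graphs.
    """
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)  # each undirected edge counted once
    best = n * n                  # loose upper bound on the edge count
    for s in itertools.product([-1, 1], repeat=n):
        s = np.array(s)
        # after switching, edge (i, j) is frustrated if s_i * A_ij * s_j < 0
        frustrated = int(np.sum((np.outer(s, s) * A)[iu] < 0))
        best = min(best, frustrated)
    return best

# a triangle with two negative edges is balanced (frustration 0);
# a triangle with three negative edges is unbalanced (frustration 1)
balanced = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]])
unbalanced = np.eye(3, dtype=int) - np.ones((3, 3), dtype=int)
print(frustration_index(balanced), frustration_index(unbalanced))  # 0 1
```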



Transformer models are gauge invariant: A mathematical connection between AI and particle physics

van Nierop, Leo

arXiv.org Artificial Intelligence

In particle physics, the fundamental forces are subject to symmetries known as gauge invariance: a redundancy in the mathematical description of a physical system. In this article I demonstrate that the transformer architecture exhibits the same property, and show that the default representation of transformers has partially, but not fully, removed this gauge invariance.
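A minimal numpy sketch of one such redundancy (my construction for illustration; the article's analysis is more general): rotating the residual-stream basis by any orthogonal matrix R, while counter-rotating the attention weights and the unembedding, leaves the output logits unchanged. LayerNorm is omitted for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, vocab = 5, 16, 50
X = rng.standard_normal((T, d))              # token representations
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) for _ in range(4))
U = rng.standard_normal((vocab, d))          # unembedding matrix

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

def logits(X, Wq, Wk, Wv, Wo, U):
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))
    return (X + A @ (X @ Wv) @ Wo) @ U.T     # residual + attention head

# gauge transformation: orthogonal rotation R of the residual stream,
# compensated in every matrix that reads from or writes to it
R, _ = np.linalg.qr(rng.standard_normal((d, d)))
out1 = logits(X, Wq, Wk, Wv, Wo, U)
out2 = logits(X @ R, R.T @ Wq, R.T @ Wk, R.T @ Wv, Wo @ R, U @ R)
print(np.allclose(out1, out2))               # True: logits are invariant
```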


Reviews: Gauging Variational Inference

Neural Information Processing Systems

The paper presents two new approaches to variational inference for graphical models with binary variables - an important fundamental problem. By adding a search over gauge transformations (which preserve the partition function) to the standard variational search over (pseudo-)marginals, the authors present methods called gauged mean field (G-MF, Alg 1) and gauged belief propagation (G-BP, Alg 2). Very helpfully, both are guaranteed to yield a lower bound on the true partition function - this may be important in practice and enables empirical testing for improved performance even for large, intractable models (since higher lower bounds are better). Initial experiments demonstrate the benefits of the new methods. The authors demonstrate technical skill and good background knowledge.
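For intuition about the transformations involved (a toy example, not the paper's G-MF/G-BP machinery, whose gauges act on factor-graph edges): for a binary (Ising) model, flipping any subset of spins while conjugating the couplings and fields leaves the partition function Z unchanged, and this is the invariance the gauge search exploits.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n = 6
J = np.triu(rng.standard_normal((n, n)), k=1)  # couplings J_ij for i < j
h = rng.standard_normal(n)                     # local fields h_i

def log_Z(J, h):
    # brute-force log partition function over all 2^n spin states
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    return np.log(sum(np.exp(s @ J @ s + h @ s) for s in states))

# gauge (spin-flip) transformation: s_i -> g_i s_i, compensated by
# J_ij -> g_i g_j J_ij and h_i -> g_i h_i
g = rng.choice([-1, 1], size=n)
print(np.isclose(log_Z(J, h), log_Z(np.outer(g, g) * J, g * h)))  # True
```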


Performance-guaranteed regularization in maximum likelihood method: Gauge symmetry in Kullback–Leibler divergence

Ichiki, Akihisa

arXiv.org Machine Learning

The maximum likelihood method is the best-known method for estimating the probabilities behind data. However, the conventional method yields the probability model closest to the empirical distribution, resulting in overfitting. Regularization methods prevent the model from being excessively close to this wrong probability, but little is known systematically about their performance. The idea of regularization is similar to that of error-correcting codes, which obtain optimal decoding by mixing suboptimal solutions with an incorrectly received code. Optimal decoding in error-correcting codes is achieved based on gauge symmetry. We propose a theoretically guaranteed regularization for the maximum likelihood method by focusing on a gauge symmetry in the Kullback–Leibler divergence. Our approach obtains the optimal model without the need to search over the hyperparameters that frequently appear in regularization methods.
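For context, the overfitting claim rests on a standard identity (not specific to this paper): maximizing the likelihood of data $x_1, \dots, x_N$ is exactly minimizing the Kullback–Leibler divergence from the empirical distribution $\hat{p}$ to the model $q_\theta$, so the unregularized estimate hugs the data:

$$
\hat{\theta}_{\mathrm{ML}}
  = \arg\max_{\theta} \frac{1}{N}\sum_{i=1}^{N} \log q_\theta(x_i)
  = \arg\min_{\theta} D_{\mathrm{KL}}\bigl(\hat{p} \,\big\|\, q_\theta\bigr),
\qquad
\hat{p}(x) = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x_i).
$$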


Stabilizing the Maximal Entropy Moment Method for Rarefied Gas Dynamics at Single-Precision

Zheng, Candi, Yang, Wang, Chen, Shiyi

arXiv.org Artificial Intelligence

Developing extended hydrodynamics equations valid for both dense and rarefied gases remains a great challenge. A systematic approach to this challenge is the moment method, which describes both dense and rarefied gas behavior using moments of the molecular velocity distribution. Among moment methods, the maximal entropy moment method (MEM) stands out for its well-posedness and stability; it employs velocity distributions of maximal entropy. However, finding such distributions requires solving an ill-conditioned, computation-intensive optimization problem. This problem causes numerical overflow and breakdown when numerical precision is insufficient, especially for flows such as high-speed shock waves. It also prevents modern GPUs from accelerating the optimization with their enormous single-precision floating-point compute power. This paper aims to stabilize MEM, making it practical for simulating very strong normal shock waves on modern GPUs at single precision. We propose gauge transformations for MEM that make the optimization less ill-conditioned, and we tackle numerical overflow and breakdown by adopting the canonical form of the distribution and a modified Newton optimization method. With these techniques, we achieve a single-precision GPU simulation of a Mach 10 shock wave with a 35-moment MEM, surpassing previous double-precision results at Mach 4. Moreover, we argue that an over-refined spatial mesh degrades both the accuracy and the stability of MEM. Overall, these techniques make the maximal entropy moment method practical for simulating very strong normal shock waves on modern GPUs at single precision, with a significant stability improvement over previous methods.
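A toy numpy sketch of why such a gauge helps (my one-dimensional illustration with hypothetical numbers; the paper treats the full multidimensional MEM): near the optimum, the Hessian of the maximal-entropy dual problem is the covariance matrix of the moment features, and a shift-and-scale gauge on the velocity variable drastically reduces its condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
v = 300.0 + 40.0 * rng.standard_normal(100_000)  # hypothetical molecular speeds

def moment_features(x, order=4):
    return np.stack([x**k for k in range(1, order + 1)])

# Newton's method on the MEM dual uses (near the optimum) the covariance
# of the moment features as its Hessian; its conditioning governs stability
cond_raw = np.linalg.cond(np.cov(moment_features(v)))

# gauge transformation: shift and scale v to zero mean and unit variance
u = (v - v.mean()) / v.std()
cond_gauged = np.linalg.cond(np.cov(moment_features(u)))

print(f"cond(H) raw:    {cond_raw:.2e}")     # astronomically large
print(f"cond(H) gauged: {cond_gauged:.2e}")  # modest
```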


Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications

Cortes, Gianfranco, Yu, Yue, Chen, Robin, Armstrong, Melissa, Vaillancourt, David, Vemuri, Baba C.

arXiv.org Artificial Intelligence

With the advent of group equivariant convolutions in the deep networks literature, spherical CNNs with $\mathsf{SO}(3)$-equivariant layers have been developed to cope with data that are samples of signals on the sphere $S^2$. One can implicitly obtain $\mathsf{SO}(3)$-equivariant convolutions on $S^2$ with significant efficiency gains by explicitly requiring gauge equivariance w.r.t. $\mathsf{SO}(2)$. In this paper, we build on this fact by introducing a higher order generalization of the gauge equivariant convolution, whose implementation is dubbed a gauge equivariant Volterra network (GEVNet). This allows us to model spatially extended nonlinear interactions within a given receptive field while still maintaining equivariance to global isometries. We prove theoretical results regarding the equivariance and construction of higher order gauge equivariant convolutions. Then, we empirically demonstrate the parameter efficiency of our model, first on computer vision benchmark data (e.g., spherical MNIST), and then in combination with a convolutional kernel network (CKN) on neuroimaging data. In the neuroimaging experiments, the resulting two-part architecture (CKN + GEVNet) is used to automatically discriminate between patients with Lewy body dementia (DLB), Alzheimer's disease (AD), and Parkinson's disease (PD) from diffusion magnetic resonance images (dMRI). The GEVNet extracts micro-architectural features within each voxel, while the CKN extracts macro-architectural features across voxels. This compound architecture is uniquely poised to exploit the intra- and inter-voxel information contained in the dMRI data, leading to improved performance over the classification results obtained from either of the individual components.
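A planar toy version of the second-order term (my sketch, ignoring the gauge structure and manifold geometry that the paper handles): a Volterra filter adds quadratic interactions between every pair of pixels in the receptive field on top of the usual linear response.

```python
import numpy as np

def volterra_conv2d(image, w1, w2, k=3):
    """Planar toy analogue of a second-order Volterra filter: each output
    is a linear term plus a quadratic term coupling every pair of pixels
    in the k x k receptive field.
    """
    H, W = image.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            x = image[i:i + k, j:j + k].ravel()
            out[i, j] = w1 @ x + x @ w2 @ x   # 1st- plus 2nd-order response
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
w1 = rng.standard_normal(9)                   # linear kernel
w2 = 0.1 * rng.standard_normal((9, 9))        # quadratic (Volterra) kernel
print(volterra_conv2d(img, w1, w2).shape)     # (6, 6)
```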


Geometrical aspects of lattice gauge equivariant convolutional neural networks

Aronsson, Jimmy, Müller, David I., Schuh, Daniel

arXiv.org Artificial Intelligence

Lattice gauge equivariant convolutional neural networks (L-CNNs) are a framework for convolutional neural networks that can be applied to non-Abelian lattice gauge theories without violating gauge symmetry. We demonstrate how L-CNNs can be equipped with global group equivariance. This allows us to extend the formulation to be equivariant not just under translations but under global lattice symmetries such as rotations and reflections. Additionally, we provide a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as a special case of gauge equivariant neural networks on SU($N$) principal bundles.
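A one-dimensional U(1) toy version of the underlying invariance (the paper works with non-Abelian SU($N$) links on higher-dimensional lattices): a gauge transformation multiplies each link by local phases at its endpoints, and traced Wilson loops, the gauge-invariant building blocks that L-CNNs are designed around, are unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
U = np.exp(1j * rng.uniform(0, 2 * np.pi, N))       # U(1) links on a ring

# local gauge transformation: U_j -> Omega_j * U_j * conj(Omega_{j+1})
omega = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
U_gauged = omega * U * np.conj(np.roll(omega, -1))

# the Wilson loop (ordered product around the ring) is gauge invariant
print(np.prod(U))
print(np.prod(U_gauged))   # same complex number
```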


Equivariant Mesh Attention Networks

Basu, Sourya, Gallego-Posada, Jose, Viganò, Francesco, Rowbottom, James, Cohen, Taco

arXiv.org Artificial Intelligence

Equivariance to symmetries has proven to be a powerful inductive bias in deep learning research. Recent works on mesh processing have concentrated on various kinds of natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. To date, no existing architecture is equivariant to all of these transformations. In this paper, we present an attention-based architecture for mesh data that is provably equivariant to all transformations mentioned above. Our pipeline relies on the use of relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on the FAUST and TOSCA datasets confirm that our proposed architecture achieves improved performance on these benchmarks and is indeed equivariant, and therefore robust, to a wide variety of local/global transformations.
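A minimal sketch of the idea behind relative features (simplified to point clouds; the paper's features live in local tangent frames on a mesh): replacing raw node positions with edge vectors to neighbors makes the inputs invariant to global translations and equivariant to rotations.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.standard_normal((10, 3))            # mesh node positions
edges = [(0, 1), (0, 2), (1, 3)]              # toy connectivity

def relative_features(pos, edges):
    # edge vector from node i to neighbor j, instead of raw positions
    return np.stack([pos[j] - pos[i] for i, j in edges])

t = np.array([5.0, -2.0, 1.0])                # global translation
f1 = relative_features(pos, edges)
f2 = relative_features(pos + t, edges)
print(np.allclose(f1, f2))                    # True: translation invariant

R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # global rotation
f3 = relative_features(pos @ R.T, edges)
print(np.allclose(f1 @ R.T, f3))              # True: rotation equivariant
```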